An adaptive rough fuzzy single pass algorithm for clustering large data sets

Authors

  • S. Asharaf
  • M. Narasimha Murty
Abstract

Cluster analysis has been widely applied in many areas such as data mining, geographical data processing, medicine, classification of statistical findings in social studies, and so on. Most of these domains deal with massive collections of data. Hence the methods to handle them must be efficient both in terms of the number of data set scans and memory usage. Several algorithms have been proposed in the literature for clustering large data sets, viz. CLARANS [1], DBSCAN [1], CURE [1], K-Means [2], etc. Most of these require more than one pass through the data set to find the required abstraction. Hence they are computationally expensive for the clustering of large data sets. Even though there is a single pass clustering algorithm called BIRCH [1], it uses a memory-expensive data structure called the CF tree. In this scenario the Leader algorithm [3], which requires only a single data set scan and less memory, turns out to be a potential candidate. This paper introduces an efficient variant of the Leader algorithm called the Adaptive Rough Fuzzy Leader (ARFL) algorithm, which outperforms the conventional Leader algorithm. It employs a combination of rough set theory [4] and fuzzy set theory [5] to capture the intrinsic uncertainty involved in cluster analysis. The paper is organized as follows. Section 2 discusses the conventional Leader algorithm, Section 3 introduces the proposed algorithm, Section 4 gives a comparative study of the proposed algorithm with the conventional Leader algorithm and the single pass K-means algorithm, and Section 5 presents conclusions.
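For orientation, the conventional Leader algorithm referred to in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' code: it assumes Euclidean distance, a user-chosen distance threshold, and assignment to the first leader found within that threshold. The names `leader_clustering` and `threshold` are hypothetical, and the ARFL variant itself is not shown because the abstract does not specify its update rules.

```python
import math
from typing import List, Sequence, Tuple


def leader_clustering(
    points: Sequence[Sequence[float]], threshold: float
) -> Tuple[List[Tuple[float, ...]], List[int]]:
    """Single-pass Leader clustering sketch (assumed Euclidean distance).

    Each point is compared against the leaders found so far. It joins the
    first leader within `threshold`; otherwise it becomes a new leader.
    """
    leaders: List[Tuple[float, ...]] = []  # one representative per cluster
    labels: List[int] = []                 # cluster index for each point

    for p in points:  # the data set is scanned exactly once
        for idx, leader in enumerate(leaders):
            if math.dist(p, leader) <= threshold:
                labels.append(idx)
                break
        else:
            # No existing leader is close enough: start a new cluster.
            leaders.append(tuple(p))
            labels.append(len(leaders) - 1)

    return leaders, labels


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.5, 0.0), (5.0, 5.0), (5.2, 5.1), (0.1, 0.2)]
    found_leaders, found_labels = leader_clustering(data, threshold=1.0)
    print(found_leaders)
    print(found_labels)
```

Only the leaders need to be kept in memory during the scan, which is consistent with the abstract's point about a single data set scan and low memory usage. Note that in this sketch the result depends on the order in which the points are presented and on the chosen threshold.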

Similar resources

Interval set clustering of web users using modified Kohonen self-organizing maps based on the properties of rough sets

Web usage mining involves the application of data mining techniques to discover usage patterns from web data. Clustering is one of the important functions in web usage mining. The likelihood of bad or incomplete web usage data is higher than in conventional applications. The clusters and associations in web usage mining do not necessarily have crisp boundaries. Researchers have studied the pos...

A Framework for Optimal Attribute Evaluation and Selection in Hesitant Fuzzy Environment Based on Enhanced Ordered Weighted Entropy Approach for Medical Dataset

Background: In this paper, a generic hesitant fuzzy set (HFS) model for clustering various ECG beats according to weights of attributes is proposed. A comprehensive review of electrocardiogram signal classification and segmentation methodologies indicates that algorithms able to effectively handle the nonstationarity and uncertainty of the signals should be used for ECG analysis. Ex...

Diagnosis of the disease using an ant colony gene selection method based on information gain ratio using fuzzy rough sets

With the advancement of metagenome data mining science has become focused on microarrays. Microarrays are datasets with a large number of genes that are usually irrelevant to the output class; hence, the process of gene selection or feature selection is essential. So, it follows that you can remove redundant genes and increase the speed and accuracy of classification. After applying the gene se...

A Multi-Objective Approach to Fuzzy Clustering using ITLBO Algorithm

Data clustering is one of the most important areas of research in data mining and knowledge discovery. Recent research in this area has shown that the best clustering results can be achieved using multi-objective methods. In other words, assuming more than one criterion as objective functions for clustering data can measurably increase the quality of clustering. In this study, a model with two ...

Enforcement of rough fuzzy clustering based on correlation analysis

Clustering is a standard approach in the analysis of data and the construction of separated similar groups. The most widely used robust soft clustering methods are fuzzy, rough, and rough fuzzy clustering. The prominent features of soft clustering motivate combining rough and fuzzy sets. The Rough Fuzzy C-Means (RFCM) includes the lower and boundary estimation of rough sets, and fuzzy membership of fuz...

Journal title:
  • Pattern Recognition

Volume 36  Issue 

Pages  -

Publication date 2003